fix: unify ordering display with optimization path #20362

Open
adriangb wants to merge 7 commits into apache:main from pydantic:fix-complex-projection-ordering


@adriangb
Contributor

@adriangb commented Feb 15, 2026

Summary

Unify the ordering display path with the optimization path so EXPLAIN output always matches what the optimizer sees.

FileScanConfig previously had two independent paths computing orderings:

  1. Optimization (eq_properties()): validates orderings at table-schema level via validated_output_ordering(), then projects through EquivalenceProperties::project().
  2. Display (fmt_as()): independently recomputed orderings via get_projected_output_ordering(), which validated them after projection and could therefore disagree with path 1.

The display path dropped valid orderings when any projection expression was complex (e.g. a + 1), even if the ordering column itself was a simple column reference. This PR replaces the display computation with eq_properties().oeq_class(), the same orderings the optimizer uses.
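To illustrate the structural point of the fix, here is a minimal Rust sketch of the "one source of truth" idea. It uses hypothetical, simplified types (`SortKey`, `ScanConfig`, `output_orderings`), not DataFusion's `FileScanConfig` or `EquivalenceProperties`. The `Display` impl renders whatever the planning-side method returns instead of recomputing orderings with its own rules, so the two cannot disagree. The prefix-truncation rule inside `output_orderings` is an assumption made for the sketch, not the real projection logic.

```rust
use std::fmt;

/// Hypothetical, simplified sort key (not DataFusion's PhysicalSortExpr).
#[derive(Clone, Debug, PartialEq)]
struct SortKey {
    column: String,
    ascending: bool,
}

/// Hypothetical scan node standing in for FileScanConfig.
struct ScanConfig {
    /// Declared orderings, assumed already validated against file statistics.
    orderings: Vec<Vec<SortKey>>,
    /// Columns that survive the projection as plain column references.
    projected: Vec<String>,
}

impl ScanConfig {
    /// The one place output orderings are derived. Planning code and the
    /// Display impl both call this, so they cannot drift apart.
    fn output_orderings(&self) -> Vec<Vec<SortKey>> {
        self.orderings
            .iter()
            .map(|ordering| {
                // Keep the longest prefix whose columns are still available.
                ordering
                    .iter()
                    .take_while(|k| self.projected.contains(&k.column))
                    .cloned()
                    .collect::<Vec<_>>()
            })
            .filter(|ordering| !ordering.is_empty())
            .collect()
    }
}

impl fmt::Display for ScanConfig {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Render exactly what output_orderings() returns; no separate logic.
        let rendered: Vec<String> = self
            .output_orderings()
            .iter()
            .map(|ordering| {
                let keys: Vec<String> = ordering
                    .iter()
                    .map(|k| format!("{} {}", k.column, if k.ascending { "ASC" } else { "DESC" }))
                    .collect();
                format!("[{}]", keys.join(", "))
            })
            .collect();
        write!(f, "output_orderings=[{}]", rendered.join(", "))
    }
}
```

For example, with declared orderings `[[a ASC, b ASC], [c ASC]]` and only `a` and `c` projected, both the planning side and `to_string()` see `[[a ASC], [c ASC]]`.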

Changes

  • Replace get_projected_output_ordering() calls in both DataSource::fmt_as and DisplayAs::fmt_as with self.eq_properties().oeq_class()
  • Delete get_projected_output_ordering and resolve_sort_column_projection (no longer needed)
  • Add 3 regression tests:
    • test_display_ordering_with_complex_projection_multi_file — complex projections no longer drop valid orderings
    • test_display_ordering_dropped_for_overlapping_stats — overlapping file stats correctly suppress orderings
    • test_display_ordering_matches_eq_properties — display and optimization paths agree
  • Update SLT expectations to reflect equivalence-aware ordering display (e.g. simplified orderings when filter constants are present, additional equivalent orderings from monotonic projections like CAST)
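A rough sketch of why EXPLAIN can now list extra orderings: if the input is sorted on `a` and the projection also emits an order-preserving expression of `a`, that output column is sorted too. The `Expr` enum and `equivalent_sorted_outputs` are hypothetical. Treating every `Cast` as order-preserving is a simplifying assumption, since whether a real CAST is monotonic depends on the source and target types; DataFusion's actual reasoning lives in its equivalence machinery.

```rust
/// Hypothetical, simplified projection expression.
#[derive(Clone, Debug)]
enum Expr {
    /// Plain reference to an input column.
    Column(String),
    /// Cast of an input column; assumed order-preserving in this sketch.
    Cast(String),
    /// Any expression whose monotonicity is unknown.
    Opaque,
}

/// Given an input sorted ascending on `sorted_col`, return the aliases of all
/// projected outputs that are therefore also sorted ascending.
fn equivalent_sorted_outputs(sorted_col: &str, projection: &[(Expr, &str)]) -> Vec<String> {
    projection
        .iter()
        .filter_map(|(expr, alias)| match expr {
            Expr::Column(c) | Expr::Cast(c) if c == sorted_col => Some(alias.to_string()),
            _ => None,
        })
        .collect()
}
```

With a projection of `a`, `CAST(a) AS a_big`, and some opaque expression, an input ordering on `a` yields two output orderings, one on `a` and one on `a_big`.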

Test plan

  • cargo test -p datafusion-datasource (100 tests pass)
  • SLT tests updated and passing: sort_pushdown, union, window, monotonic_projection_test, topk, group_by, joins

🤖 Generated with Claude Code

@github-actions bot added the datasource label (Changes to the datasource crate) Feb 15, 2026
@adriangb changed the title from "fix: handle complex projections in ordering validation" to "fix: unify ordering display with optimization path" Feb 15, 2026
@github-actions bot added the sqllogictest label (SQL Logic Tests (.slt)) Feb 15, 2026
@adriangb requested a review from zhuqi-lucas February 15, 2026 16:28
@adriangb marked this pull request as ready for review February 16, 2026 00:16
@adriangb
Contributor Author

@zhuqi-lucas could you review this change please?

adriangb and others added 5 commits February 22, 2026 10:46
Previously, `get_projected_output_ordering` used
`ordered_column_indices_from_projection` which was all-or-nothing: if any
expression in the projection wasn't a simple Column, it returned None for
the entire projection — even if the sort columns themselves were simple
column refs.

Replace it with `resolve_sort_column_projection` which only requires
sort-column positions to resolve to simple Columns. Each ordering is now
independently evaluated: orderings on simple column refs get validated
with statistics even when other projection expressions are complex.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
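The difference between the two strategies described in this commit can be sketched with hypothetical stand-ins: `all_columns` mimics the all-or-nothing behavior attributed to `ordered_column_indices_from_projection`, and `resolve_sort_columns` mimics the per-sort-column behavior of `resolve_sort_column_projection`. Neither is the actual DataFusion code. `ProjExpr` is a toy model in which a projection entry is either a plain column (by table-schema index) or some complex expression such as `a + 1`.

```rust
/// Hypothetical projection entry: a plain column (table-schema index)
/// or any complex expression such as `a + 1`.
#[derive(Clone, Debug)]
enum ProjExpr {
    Column(usize),
    Complex,
}

/// All-or-nothing: returns None as soon as ANY projected expression
/// is not a plain column, regardless of which columns are sorted.
fn all_columns(projection: &[ProjExpr]) -> Option<Vec<usize>> {
    projection
        .iter()
        .map(|e| match e {
            ProjExpr::Column(i) => Some(*i),
            ProjExpr::Complex => None,
        })
        .collect()
}

/// Per-sort-column: only the sort columns must appear as plain columns.
/// Returns the output position of each sort column, or None if any sort
/// column is not projected as a plain column reference.
fn resolve_sort_columns(projection: &[ProjExpr], sort_cols: &[usize]) -> Option<Vec<usize>> {
    sort_cols
        .iter()
        .map(|&col| {
            projection
                .iter()
                .position(|e| matches!(e, ProjExpr::Column(i) if *i == col))
        })
        .collect()
}
```

With table columns `a`, `b`, `c` at indices 0, 1, 2 and a projection like `[a, a + 1, c]`, `all_columns` gives up entirely, while `resolve_sort_columns` still maps an ordering on `c` to output position 2.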
Replace the independent display computation (get_projected_output_ordering)
with orderings extracted from eq_properties().oeq_class(), so EXPLAIN
output always matches what the optimizer actually sees.

Previously, fmt_as() independently recomputed orderings via
get_projected_output_ordering(), which validated post-projection and
would drop valid orderings when any projection expression was complex
(e.g. `a + 1`). Now both display and optimization use the same path:
validate at table-schema level, then project through
EquivalenceProperties::project().

- Delete get_projected_output_ordering and resolve_sort_column_projection
- Update DataSource::fmt_as and DisplayAs::fmt_as to use eq_properties()
- Add regression tests for complex projections with multi-file groups
- Update SLT expectations for equivalence-aware ordering display

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The partition/file ordering diagrams from the deleted
get_projected_output_ordering are useful context for understanding
why we validate orderings against file statistics. Move them to
validated_output_ordering where they belong.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
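The reasoning behind validating orderings against file statistics can be reduced to a small check. Assuming each file in a group is internally sorted ascending on one key and the files are read in the listed order, the concatenated stream stays sorted only if no file's value range overlaps the next one's. `FileStats` and `group_preserves_ordering` are hypothetical and track a single `i64` column. This sketch treats touching ranges (`max == next min`) as non-overlapping, which may be more permissive than DataFusion's actual check.

```rust
/// Hypothetical min/max statistics for one sort column in one file.
#[derive(Clone, Copy, Debug)]
struct FileStats {
    min: i64,
    max: i64,
}

/// Returns true if reading `files` back to back (in the given order) keeps
/// an ascending per-file ordering intact, i.e. no file's range overlaps the
/// next file's range. Empty and single-file groups trivially qualify.
fn group_preserves_ordering(files: &[FileStats]) -> bool {
    files.windows(2).all(|pair| pair[0].max <= pair[1].min)
}
```

Overlapping ranges, the case `test_display_ordering_dropped_for_overlapping_stats` is described as covering, make the check fail, so the ordering should not be advertised for that group.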
@adriangb force-pushed the fix-complex-projection-ordering branch from c830e2c to 73af179 February 22, 2026 10:46
@adriangb
Contributor Author

Hi @zhuqi-lucas quick bump 😄

Contributor

Copilot AI left a comment


Pull request overview

This PR aligns DataSourceExec ordering display with the optimizer’s ordering/equivalence analysis so EXPLAIN reflects the same orderings the optimizer uses, avoiding mismatches for complex projections and equivalence-derived orderings.

Changes:

  • Switch FileScanConfig display to use eq_properties().oeq_class() rather than recomputing projected orderings for formatting.
  • Remove the old display-only projected-ordering computation helpers.
  • Update SLT expectations to reflect equivalence-aware ordering display (including multiple equivalent orderings) and add regression tests for the unified behavior.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

  • datafusion/datasource/src/file_scan_config.rs: Unifies displayed orderings with optimizer orderings via oeq_class(), removes old display path helpers, and adds regression tests.
  • datafusion/sqllogictest/test_files/window.slt: Updates expected DataSourceExec ordering display (including multiple orderings).
  • datafusion/sqllogictest/test_files/union.slt: Updates ordering display expectations for union inputs.
  • datafusion/sqllogictest/test_files/topk.slt: Updates TopK-related ordering display expectations to show equivalence-derived orderings.
  • datafusion/sqllogictest/test_files/sort_pushdown.slt: Updates sort-pushdown ordering display to match equivalence-aware output ordering.
  • datafusion/sqllogictest/test_files/monotonic_projection_test.slt: Updates monotonic-projection ordering expectations to include additional equivalent orderings.
  • datafusion/sqllogictest/test_files/joins.slt: Updates join projection pushdown ordering display to include equivalent ordering on projected expression.
  • datafusion/sqllogictest/test_files/group_by.slt: Updates ordering display expectations in group-by plans to match equivalence-aware ordering sets.


Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@zhuqi-lucas
Contributor

> @zhuqi-lucas could you review this change please?

Hi @adriangb, sorry, I am out for the Chinese New Year holiday; I will review this when I am back.

@adriangb
Contributor Author

> @zhuqi-lucas could you review this change please?
>
> Hi @adriangb, sorry, I am out for the Chinese New Year holiday; I will review this when I am back.

Sorry to ping you on holiday then, no rush just thought it might have slipped through the cracks!

zhuqi-lucas pushed a commit to zhuqi-lucas/arrow-datafusion that referenced this pull request Feb 24, 2026
Comprehensive review of the PR that fixes display/optimizer ordering
disagreement in FileScanConfig by replacing the independent display
path with eq_properties().oeq_class().

https://claude.ai/code/session_01HerFnFzGc7s4AQppknpup3
Contributor

@zhuqi-lucas left a comment


Thanks @adriangb for the work on unifying the display and optimization paths!

I noticed several changes in the EXPLAIN output that seem like regressions from a user-facing perspective:

  1. The order of multiple orderings changed

In many tests (e.g., group_by.slt, window.slt):

Before: output_orderings=[[a@1 ASC, b@2 ASC], [c@3 ASC]]
After: output_orderings=[[c@3 ASC], [a@1 ASC, b@2 ASC]]

The original ordering list followed the table's declared sort order, which was intuitive. Now it follows the internal projected_orderings() generation order from the dependency map, which is less predictable for users reading EXPLAIN output. I am not sure whether this is the intended behavior.

  2. Filter-constant columns are stripped

In sort_pushdown.slt:

Before: output_ordering=[timeframe@0 ASC NULLS LAST, period_end@1 ASC NULLS LAST]
After: output_ordering=[period_end@1 ASC NULLS LAST]

The physical file ordering is no longer visible in EXPLAIN. While the optimizer correctly knows timeframe is constant after filter pushdown, the user loses visibility into the actual file sort order.
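To make the sort_pushdown.slt change concrete: once a filter pins a column to a single value, sort keys on that column no longer affect row order, so an equivalence-aware normalization can drop them. Below is a minimal sketch with a hypothetical `strip_constant_keys` helper, assuming the set of constant columns is already known (for example from an equality predicate on `timeframe`); it is not DataFusion's normalization code.

```rust
use std::collections::HashSet;

/// Hypothetical helper: drop sort keys on columns known to be constant
/// (e.g. pinned by an equality filter). A constant column cannot change the
/// relative order of rows, so the remaining keys describe the same order.
fn strip_constant_keys(ordering: &[&str], constants: &HashSet<&str>) -> Vec<String> {
    ordering
        .iter()
        .copied()
        // `c` is `&&str` here; `*c` is the column name as `&str`.
        .filter(|c| !constants.contains(*c))
        .map(String::from)
        .collect()
}
```

Applied to `[timeframe, period_end]` with `timeframe` constant, it returns `[period_end]`, matching the After line above. Both forms describe the same row order; the open question is which one EXPLAIN should show.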

I understand the goal is to make display match what the optimizer sees, but could we achieve the unification (e.g., removing the separate get_projected_output_ordering code path) while still preserving the original ordering list order and showing the full physical orderings? For example, using validated_output_ordering() with proper projection handling, without going through the full equivalence-class normalization for display purposes.
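A rough sketch of what the suggested alternative could look like: project each already-validated declared ordering through the projection in its declared list order, re-index columns to their output positions, and cut an ordering at the first key that is not a plain projected column. `project_in_declared_order`, `DeclaredKey`, and the `Option<usize>` projection encoding are hypothetical; validation against file statistics (what `validated_output_ordering()` is described as doing) is assumed to have happened before this step.

```rust
use std::collections::HashMap;

/// Hypothetical declared sort key: (table-schema column index, ascending?).
type DeclaredKey = (usize, bool);

/// Render declared orderings in their declared list order after projection.
/// `projection[out_idx]` is `Some(table_idx)` for a plain column reference
/// and `None` for a complex expression; `names[out_idx]` is the output name.
fn project_in_declared_order(
    declared: &[Vec<DeclaredKey>],
    projection: &[Option<usize>],
    names: &[&str],
) -> String {
    // Map each table column to its first output position as a plain column.
    let mut out_pos: HashMap<usize, usize> = HashMap::new();
    for (out_idx, table_idx) in projection.iter().enumerate() {
        if let Some(t) = table_idx {
            out_pos.entry(*t).or_insert(out_idx);
        }
    }

    let rendered: Vec<String> = declared
        .iter()
        .filter_map(|ordering| {
            // Keep keys until the first one whose column is not projected.
            let keys: Vec<String> = ordering
                .iter()
                .map_while(|(t, asc)| {
                    out_pos.get(t).map(|&o| {
                        let dir = if *asc { "ASC" } else { "DESC" };
                        format!("{}@{} {}", names[o], o, dir)
                    })
                })
                .collect();
            if keys.is_empty() {
                None
            } else {
                Some(format!("[{}]", keys.join(", ")))
            }
        })
        .collect();

    format!("output_orderings=[{}]", rendered.join(", "))
}
```

For declared orderings `[[a, b], [c]]` and a projection `[a + 1, a, b, c]`, this produces `output_orderings=[[a@1 ASC, b@2 ASC], [c@3 ASC]]`, the Before form quoted above. Because it bypasses equivalence analysis, it would likely also omit equivalence-derived orderings (such as the extra CAST orderings in monotonic_projection_test.slt), so display and optimizer could again diverge in those cases.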


Labels

datasource (Changes to the datasource crate), sqllogictest (SQL Logic Tests (.slt))


2 participants